This exercise uses the data set `LakeHuron`, giving the level of Lake Huron from 1875–1972.
- Convert the data to a tsibble object using the `as_tsibble()` function.
- Fit a piecewise linear trend model to the Lake Huron data with a knot at 1920 and an ARMA error structure.
- Forecast the level for the next 30 years. Do you think the extrapolated linear trend is realistic?
huron <- as_tsibble(LakeHuron)
fit <- huron |>
model(ARIMA(value ~ trend(knot = 1920)))
report(fit)
## Series: value
## Model: LM w/ ARIMA(2,0,0) errors
##
## Coefficients:
## ar1 ar2 trend(knot = 1920)trend trend(knot = 1920)trend_46
## 0.9628 -0.3107 -0.0572 0.0633
## s.e. 0.0973 0.0983 0.0161 0.0265
## intercept
## 580.9391
## s.e. 0.5124
##
## sigma^2 estimated as 0.4594: log likelihood=-98.86
## AIC=209.73 AICc=210.65 BIC=225.24
fit |>
forecast(h = 30) |>
autoplot(huron) + labs(y = "feet")
It seems unlikely that there was an increasing trend from 1973 to 2002, but the prediction intervals are very wide, so they probably capture the actual values; the interval widths can be checked directly, as sketched below. Historical data on the level of Lake Huron can be obtained from NOAA.
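One quick way to inspect the interval widths (a sketch using `hilo()` on the fitted model from above; `index` is the year column created by `as_tsibble()`):
fit |>
  forecast(h = 30) |>
  hilo(level = 95) |>
  select(index, `95%`)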
Repeat Exercise 4 from Section 7.10, but this time adding in ARIMA errors to address the autocorrelations in the residuals.
- How much difference does the ARIMA error process make to the regression coefficients?
fit <- souvenirs |>
mutate(festival = month(Month) == 3 & year(Month) != 1987) |>
model(
reg = TSLM(log(Sales) ~ trend() + season() + festival),
dynreg = ARIMA(log(Sales) ~ trend() + season() + festival)
)
tidy(fit) |> print(n=50)
## # A tibble: 31 × 6
## .model term estimate std.error statistic p.value
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 reg (Intercept) 7.62 0.0742 103. 4.67e-78
## 2 reg trend() 0.0220 0.000827 26.6 2.32e-38
## 3 reg season()year2 0.251 0.0957 2.63 1.06e- 2
## 4 reg season()year3 0.266 0.193 1.38 1.73e- 1
## 5 reg season()year4 0.384 0.0957 4.01 1.48e- 4
## 6 reg season()year5 0.409 0.0957 4.28 5.88e- 5
## 7 reg season()year6 0.449 0.0958 4.69 1.33e- 5
## 8 reg season()year7 0.610 0.0958 6.37 1.71e- 8
## 9 reg season()year8 0.588 0.0959 6.13 4.53e- 8
## 10 reg season()year9 0.669 0.0959 6.98 1.36e- 9
## 11 reg season()year10 0.747 0.0960 7.79 4.48e-11
## 12 reg season()year11 1.21 0.0960 12.6 1.29e-19
## 13 reg season()year12 1.96 0.0961 20.4 3.39e-31
## 14 reg festivalTRUE 0.502 0.196 2.55 1.29e- 2
## 15 dynreg ar1 0.556 0.179 3.11 2.53e- 3
## 16 dynreg ma1 -0.129 0.192 -0.670 5.05e- 1
## 17 dynreg ma2 0.340 0.114 2.99 3.68e- 3
## 18 dynreg trend() 0.0226 0.00150 15.1 1.17e-25
## 19 dynreg season()year2 0.252 0.0574 4.38 3.40e- 5
## 20 dynreg season()year3 0.297 0.118 2.51 1.42e- 2
## 21 dynreg season()year4 0.377 0.0729 5.17 1.56e- 6
## 22 dynreg season()year5 0.400 0.0789 5.07 2.30e- 6
## 23 dynreg season()year6 0.438 0.0817 5.36 7.19e- 7
## 24 dynreg season()year7 0.598 0.0827 7.23 2.04e-10
## 25 dynreg season()year8 0.573 0.0821 6.98 6.45e-10
## 26 dynreg season()year9 0.651 0.0799 8.16 2.94e-12
## 27 dynreg season()year10 0.725 0.0746 9.71 2.18e-15
## 28 dynreg season()year11 1.18 0.0629 18.7 1.14e-31
## 29 dynreg season()year12 1.93 0.0599 32.2 5.41e-49
## 30 dynreg festivalTRUE 0.461 0.119 3.86 2.19e- 4
## 31 dynreg intercept 7.60 0.0857 88.7 8.60e-85
The regression coefficients from the two models are all relatively close, although the dynamic regression gives smaller standard errors for the seasonal dummies and the festival effect. A side-by-side comparison is sketched below.
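To line the estimates up (a sketch; `pivot_wider()` is from tidyr, which is attached with fpp3 — the ARMA coefficients and the differently named intercept terms will show as NA in one column):
tidy(fit) |>
  select(.model, term, estimate) |>
  tidyr::pivot_wider(names_from = .model, values_from = estimate)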
- How much difference does the ARIMA error process make to the forecasts?
future_souvenirs <- new_data(souvenirs, n = 24) |>
mutate(festival = month(Month) == 3)
fit |>
forecast(new_data = future_souvenirs) |>
autoplot(souvenirs, level=95)
The forecasts are also extremely close; a numerical comparison is sketched below.
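A side-by-side comparison of the point forecasts (a sketch):
fit |>
  forecast(new_data = future_souvenirs) |>
  as_tibble() |>
  select(.model, Month, .mean) |>
  tidyr::pivot_wider(names_from = .model, values_from = .mean)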
- Check the residuals of the fitted model to ensure the ARIMA process has adequately addressed the autocorrelations seen in the `TSLM` model.
fit |>
select(dynreg) |>
gg_tsresiduals()
These look fine.
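A Ljung-Box test provides a complementary numerical check (a sketch; dof = 3 counts the ar1, ma1 and ma2 coefficients of the dynamic regression):
fit |>
  select(dynreg) |>
  augment() |>
  features(.innov, ljung_box, lag = 24, dof = 3)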
Repeat the daily electricity example, but instead of using a quadratic function of temperature, use a piecewise linear function with the “knot” around 25 degrees Celsius (use predictors `Temperature` and `Temp2`). How can you optimise the choice of knot?
vic_elec_daily <- vic_elec |>
filter(year(Time) == 2014) |>
index_by(Date = date(Time)) |>
summarise(
Demand = sum(Demand) / 1e3,
Temperature = max(Temperature),
Holiday = any(Holiday)
) |>
mutate(
Temp2 = I(pmax(Temperature - 25, 0)),
Day_Type = case_when(
Holiday ~ "Holiday",
wday(Date) %in% 2:6 ~ "Weekday",
TRUE ~ "Weekend"
)
)
First, we will leave the knot at 25 and find the best ARIMA model. This will take a while, but we only have to do it once.
vic_elec_daily |>
model(fit = ARIMA(Demand ~ Temperature + Temp2 + (Day_Type == "Weekday"))) |>
report()
## Series: Demand
## Model: LM w/ ARIMA(3,1,1)(2,0,0)[7] errors
##
## Coefficients:
## ar1 ar2 ar3 ma1 sar1 sar2 Temperature Temp2
## 0.7914 -0.0108 0.0110 -0.9762 0.1995 0.2936 -0.7025 4.4068
## s.e. 0.0730 0.0831 0.0568 0.0213 0.0542 0.0569 0.1744 0.3069
## Day_Type == "Weekday"TRUE
## 31.4648
## s.e. 1.3757
##
## sigma^2 estimated as 61.42: log likelihood=-1262.4
## AIC=2544.81 AICc=2545.43 BIC=2583.78
Now we will fix that ARIMA(3,1,1)(2,0,0)[7] structure and vary the knot until the AICc is optimised. One way to search over candidate knots is sketched below.
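A possible search (a sketch; the helper `knot_aicc()` and the grid of candidate knots are our own illustrative choices, not from the original analysis):
knot_aicc <- function(knot) {
  aicc <- vic_elec_daily |>
    mutate(Temp2 = I(pmax(Temperature - knot, 0))) |>
    model(fit = ARIMA(Demand ~ Temperature + Temp2 + (Day_Type == "Weekday") +
      pdq(3, 1, 1) + PDQ(2, 0, 0))) |>
    glance() |>
    pull(AICc)
  # glance() returns no rows when estimation fails at a given knot
  if (length(aicc) == 0) NA_real_ else aicc
}
knots <- seq(22, 32, by = 0.25) # coarse grid; refine near the minimum
aiccs <- purrr::map_dbl(knots, knot_aicc)
knots[which.min(aiccs)]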
elec_model <- vic_elec_daily |>
  mutate(Temp2 = I(pmax(Temperature - 28.26, 0))) |>
  model(
    fit = ARIMA(Demand ~ Temperature + Temp2 + (Day_Type == "Weekday") +
      pdq(3, 1, 1) + PDQ(2, 0, 0))
  )
glance(elec_model) |> pull(AICc)
## week
## 2534.222
The optimal knot (to 2 decimal places) is 28.26 degrees Celsius. However, the fit is very sensitive to the knot placement, and estimation can fail for some nearby knot locations.
report(elec_model)
## Series: Demand
## Model: LM w/ ARIMA(3,1,1)(2,0,0)[7] errors
##
## Coefficients:
## ar1 ar2 ar3 ma1 sar1 sar2 Temperature Temp2
## 0.7439 0.0444 0.0030 -0.9703 0.1938 0.3123 -0.0163 5.3497
## s.e. 0.0680 0.0739 0.0565 0.0248 0.0532 0.0567 0.1343 0.3315
## Day_Type == "Weekday"TRUE
## 31.0716
## s.e. 1.3595
##
## sigma^2 estimated as 59.55: log likelihood=-1256.8
## AIC=2533.6 AICc=2534.22 BIC=2572.57
augment(elec_model) |>
left_join(vic_elec_daily) |>
ggplot(aes(x = Temperature)) +
geom_point(aes(y = Demand)) +
geom_point(aes(y = .fitted), col = "red")
## Joining with `by = join_by(Date, Demand)`
augment(elec_model) |>
left_join(vic_elec_daily) |>
ggplot(aes(x = Demand)) +
geom_point(aes(y = .fitted)) +
geom_abline(aes(intercept = 0, slope = 1))
## Joining with `by = join_by(Date, Demand)`
elec_model |> gg_tsresiduals()
augment(elec_model) |>
features(.innov, ljung_box, dof = 6, lag = 21)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 fit 31.2 0.00827
The model fails the Ljung-Box test, but the significant autocorrelations are relatively small, so it should still give reasonable forecasts. The residuals show some heteroskedasticity, but otherwise look fine.
vic_next_day <- new_data(vic_elec_daily, 1) |>
mutate(
Temperature = 26,
Temp2 = I(pmax(Temperature - 28.26, 0)),
Day_Type = "Holiday"
)
forecast(elec_model, vic_next_day)
## # A fable: 1 x 7 [1D]
## # Key: .model [1]
## .model Date Demand .mean Temperature Temp2 Day_Type
## <chr> <date> <dist> <dbl> <dbl> <I<dbl>> <chr>
## 1 fit 2015-01-01 N(161, 60) 161. 26 0 Holiday
vic_elec_future <- new_data(vic_elec_daily, 14) |>
mutate(
Temperature = 26,
Temp2 = I(pmax(Temperature - 28.26, 0)),
Holiday = c(TRUE, rep(FALSE, 13)),
Day_Type = case_when(
Holiday ~ "Holiday",
wday(Date) %in% 2:6 ~ "Weekday",
TRUE ~ "Weekend"
)
)
forecast(elec_model, vic_elec_future) |>
autoplot(vic_elec_daily) + labs(y = "Electricity demand (GW)")
This exercise concerns `aus_accommodation`: the total quarterly takings from accommodation and the room occupancy level for hotels, motels, and guest houses in Australia, between January 1998 and June 2016. Total quarterly takings are in millions of Australian dollars.
- Compute the CPI-adjusted takings and plot the result for each state.
aus_accommodation <- aus_accommodation |>
mutate(
adjTakings = Takings / CPI * 100
)
aus_accommodation |>
autoplot(adjTakings)
- For each state, fit a dynamic regression model of CPI-adjusted takings with seasonal dummy variables, a piecewise linear time trend with one knot at 2008 Q1, and ARIMA errors.
fit <- aus_accommodation |>
model(
ARIMA(adjTakings ~ season() + trend(knot = yearquarter("2008 Q1")))
)
fit
## # A mable: 8 x 2
## # Key: State [8]
## State ARIMA(adjTakings ~ season() + trend(knot = year…¹
## <chr> <model>
## 1 Australian Capital Territory <LM w/ ARIMA(1,0,0) errors>
## 2 New South Wales <LM w/ ARIMA(1,0,0)(0,0,1)[4] errors>
## 3 Northern Territory <LM w/ ARIMA(0,0,1)(1,0,0)[4] errors>
## 4 Queensland <LM w/ ARIMA(1,0,0)(0,0,1)[4] errors>
## 5 South Australia <LM w/ ARIMA(1,0,0)(1,0,0)[4] errors>
## 6 Tasmania <LM w/ ARIMA(0,0,1)(1,0,0)[4] errors>
## 7 Victoria <LM w/ ARIMA(1,0,0)(0,0,1)[4] errors>
## 8 Western Australia <LM w/ ARIMA(1,0,0) errors>
## # ℹ abbreviated name:
## # ¹`ARIMA(adjTakings ~ season() + trend(knot = yearquarter("2008 Q1")))`
The seasonal dummy variables have not adequately handled the seasonality, so seasonal ARIMA components remain in most of the models.
- Check that the residuals of the model look like white noise.
fit |>
filter(State == "Victoria") |>
gg_tsresiduals()
There are no apparent problems. Similar plots should be checked for the other states; a quick test across all states is sketched below.
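As a shortcut, a Ljung-Box test can be run on every state at once (a sketch; for simplicity this omits the dof adjustment, which differs by state because the fitted models have different numbers of coefficients):
augment(fit) |>
  features(.innov, ljung_box, lag = 8)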
- Forecast the takings for each state to the end of 2017. (Hint: You will need to produce forecasts of the CPI first.)
# CPI forecasts
cpif <- aus_accommodation |>
model(ARIMA(CPI)) |>
forecast(h = 6) |>
as_tsibble() |>
select(Date, State, CPI = .mean)
fit |>
forecast(new_data = cpif) |>
mutate(Takings = adjTakings * CPI / 100)
## # A fable: 48 x 7 [1Q]
## # Key: State, .model [8]
## State .model Date adjTakings .mean CPI Takings
## <chr> <chr> <qtr> <dist> <dbl> <dbl> <dist>
## 1 Australian Capital Terr… "ARIM… 2016 Q3 N(62, 9.9) 61.6 109. N(67, 12)
## 2 Australian Capital Terr… "ARIM… 2016 Q4 N(59, 12) 58.9 110. N(65, 15)
## 3 Australian Capital Terr… "ARIM… 2017 Q1 N(59, 13) 59.0 110. N(65, 16)
## 4 Australian Capital Terr… "ARIM… 2017 Q2 N(59, 13) 59.4 111. N(66, 16)
## 5 Australian Capital Terr… "ARIM… 2017 Q3 N(61, 13) 60.9 111. N(68, 16)
## 6 Australian Capital Terr… "ARIM… 2017 Q4 N(59, 13) 58.8 112. N(66, 16)
## 7 New South Wales "ARIM… 2016 Q3 N(791, 1254) 791. 109. N(863, 1494)
## 8 New South Wales "ARIM… 2016 Q4 N(844, 1589) 844. 110. N(926, 1914)
## 9 New South Wales "ARIM… 2017 Q1 N(829, 1679) 829. 110. N(915, 2043)
## 10 New South Wales "ARIM… 2017 Q2 N(734, 1703) 734. 111. N(814, 2094)
## # ℹ 38 more rows
- What sources of uncertainty have not been taken into account in the prediction intervals?
The intervals treat the future CPI values as known, so they ignore the uncertainty in the CPI forecasts. They also ignore the uncertainty due to estimating the model parameters and selecting the model.
We fitted a harmonic regression model to part of the `us_gasoline` series in Exercise 6 in Section 7.10. We will now revisit this model, and extend it to include more data and ARMA errors.
- Using `TSLM()`, fit a harmonic regression with a piecewise linear time trend to the full `gasoline` series. Select the position of the knots in the trend and the appropriate number of Fourier terms to include by minimising the AICc or CV value.
Let’s optimise using 2 break points and an unknown number of Fourier terms. Because the number of Fourier terms is an integer, we can’t just use `optim()`. Instead, we will loop over a large number of possible values for the breakpoints and Fourier terms. More than 2000 models are fitted here, but `TSLM()` is relatively fast.
Note that the possible values of the knots must be restricted so that knot2 is always much larger than knot1. We have set them to be at least 2 years apart here.
us_gasoline |> autoplot(Barrels)
# Function to compute CV given K and knots.
get_cv <- function(K, knot1, knot2) {
us_gasoline |>
model(TSLM(Barrels ~ fourier(K = K) + trend(c(knot1, knot2)))) |>
glance() |>
pull(CV)
}
models <- expand.grid(
K = seq(25),
knot1 = yearweek(as.character(seq(1991, 2017, 2))),
knot2 = yearweek(as.character(seq(1991, 2017, 2)))
) |>
filter(knot2 - knot1 > 104) |>
as_tibble()
models <- models |>
mutate(cv = purrr::pmap_dbl(models, get_cv)) |>
arrange(cv)
# Best combination
(best <- head(models, 1))
## # A tibble: 1 × 4
## K knot1 knot2 cv
## <int> <week> <week> <dbl>
## 1 6 2007 W01 2013 W01 0.0641
fit <- us_gasoline |>
model(
TSLM(Barrels ~ fourier(K = best$K) + trend(c(best$knot1, best$knot2)))
)
- Now refit the model using `ARIMA()` to allow for correlated errors, keeping the same predictor variables as you used with `TSLM()`.
fit <- us_gasoline |>
model(ARIMA(Barrels ~ fourier(K = best$K) + trend(c(best$knot1, best$knot2)) + PDQ(0, 0, 0)))
fit |> report()
## Series: Barrels
## Model: LM w/ ARIMA(1,0,1) errors
##
## Coefficients:
## ar1 ma1 fourier(K = best$K)C1_52 fourier(K = best$K)S1_52
## 0.9277 -0.8414 -0.1144 -0.2306
## s.e. 0.0256 0.0357 0.0133 0.0132
## fourier(K = best$K)C2_52 fourier(K = best$K)S2_52
## 0.0418 0.0309
## s.e. 0.0105 0.0105
## fourier(K = best$K)C3_52 fourier(K = best$K)S3_52
## 0.0836 0.0343
## s.e. 0.0097 0.0097
## fourier(K = best$K)C4_52 fourier(K = best$K)S4_52
## 0.0187 0.0399
## s.e. 0.0094 0.0094
## fourier(K = best$K)C5_52 fourier(K = best$K)S5_52
## -0.0315 0.0011
## s.e. 0.0092 0.0092
## fourier(K = best$K)C6_52 fourier(K = best$K)S6_52
## -0.0523 0.0001
## s.e. 0.0092 0.0092
## trend(c(best$knot1, best$knot2))trend
## 0.0028
## s.e. 0.0001
## trend(c(best$knot1, best$knot2))trend_831
## -0.0051
## s.e. 0.0002
## trend(c(best$knot1, best$knot2))trend_1144 intercept
## 0.0055 7.1065
## s.e. 0.0006 0.0352
##
## sigma^2 estimated as 0.06051: log likelihood=-13.38
## AIC=64.76 AICc=65.33 BIC=163.78
- Check the residuals of the final model using the `gg_tsdisplay()` function and a Ljung-Box test. Do they look sufficiently like white noise to continue? If not, try modifying your model, or removing the first few years of data.
gg_tsresiduals(fit)
augment(fit) |> features(.innov, ljung_box, dof = 2, lag = 26)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 "ARIMA(Barrels ~ fourier(K = best$K) + trend(c(best$knot1, … 25.8 0.365
Usually we choose the lag to be twice the seasonal period, but the seasonal period here is about 52, and 104 lags is too many for the test statistic to have good properties. So we’ve set the lag to 26, which should be plenty.
The model looks pretty good, and passes the Ljung-Box test, although there is some heteroskedasticity.
- Once you have a model with white noise residuals, produce forecasts for the next year.
fit |>
forecast(h = "1 year") |>
autoplot(us_gasoline)
Electricity consumption is often modelled as a function of temperature. Temperature is measured by daily heating degrees and cooling degrees. Heating degrees is \(18^\circ\)C minus the average daily temperature when the daily average is below \(18^\circ\)C; otherwise it is zero. This provides a measure of our need to heat ourselves as temperature falls. Cooling degrees measures our need to cool ourselves as the temperature rises. It is defined as the average daily temperature minus \(18^\circ\)C when the daily average is above \(18^\circ\)C; otherwise it is zero. Let \(y_t\) denote the monthly total of kilowatt-hours of electricity used, let \(x_{1,t}\) denote the monthly total of heating degrees, and let \(x_{2,t}\) denote the monthly total of cooling degrees.
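To make the definitions concrete (a sketch; `avg_temp` is a made-up vector of daily average temperatures in degrees Celsius):
avg_temp <- c(12, 18, 23, 30)
heating_degrees <- pmax(18 - avg_temp, 0) # positive only when the daily average is below 18
cooling_degrees <- pmax(avg_temp - 18, 0) # positive only when the daily average is above 18
# x1 and x2 in the model are then the monthly totals of these daily values.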
An analyst fits the following model to a set of such data: \[y^*_t = \beta_1x^*_{1,t} + \beta_2x^*_{2,t} + \eta_t,\] where \[(1-\Phi_{1}B^{12} - \Phi_{2}B^{24})(1-B)(1-B^{12})\eta_t = (1+\theta_1 B)\varepsilon_t\] and \(y^*_t = \log(y_t)\), \(x^*_{1,t} = \sqrt{x_{1,t}}\) and \(x^*_{2,t}=\sqrt{x_{2,t}}\).
- What sort of ARIMA model is identified for \(\eta_t\)?
ARIMA(0,1,1)(2,1,0)\(_{12}\)
- The estimated coefficients are

| Parameter | Estimate | s.e. | \(Z\) | \(P\)-value |
|:---------:|---------:|-------:|------:|------------:|
| \(\beta_1\) | 0.0077 | 0.0015 | 4.98 | 0.000 |
| \(\beta_2\) | 0.0208 | 0.0023 | 9.23 | 0.000 |
| \(\theta_1\) | 0.5830 | 0.0720 | 8.10 | 0.000 |
| \(\Phi_{1}\) | -0.5373 | 0.0856 | -6.27 | 0.000 |
| \(\Phi_{2}\) | -0.4667 | 0.0862 | -5.41 | 0.000 |

Explain what the estimates of \(\beta_1\) and \(\beta_2\) tell us about electricity consumption.
\(\beta_1\) is the increase in \(y_t^*\) when \(x_{1,t}^*\) increases by 1. Since \(y^*_t = \log(y_t)\), this is approximately the proportional increase in monthly electricity usage per unit increase in \(\sqrt{x_{1,t}}\); the scale is hard to interpret directly, but the positive sign makes it clear that monthly total electricity usage goes up when monthly heating degrees goes up. Similarly for \(\beta_2\): monthly total electricity usage goes up when monthly cooling degrees goes up.
- Write the equation in a form more suitable for forecasting.
This turned out to be way more messy than I expected, but for what it’s worth, here it is in all its ugliness.
First apply the differences to the regression equation: \[ (1-B)(1-B^{12}) y_t^* = \beta_1(1-B)(1-B^{12})x_{1,t}^* + \beta_2(1-B)(1-B^{12})x_{2,t}^* + (1-B)(1-B^{12})\eta_{t}. \] So \[ (y^*_{t} - y^*_{t-1} - y^*_{t-12} +y^*_{t-13}) = \beta_1(x^*_{1,t} - x^*_{1,t-1} - x^*_{1,t-12} + x^*_{1,t-13}) + \beta_2(x^*_{2,t} - x^*_{2,t-1} - x^*_{2,t-12} + x^*_{2,t-13}) + \eta'_t, \] where \(\eta'_t = (1-B)(1-B^{12})\eta_t\), so that \((1-\Phi_{1}B^{12} - \Phi_{2}B^{24})\eta'_t = (1+\theta_1 B)\varepsilon_t\). Multiplying by the AR polynomial gives \[\begin{align*} (y^*_{t} & - y^*_{t-1} - y^*_{t-12} +y^*_{t-13}) -\Phi_{1}(y^*_{t-12} - y^*_{t-13} - y^*_{t-24} +y^*_{t-25}) -\Phi_{2}(y^*_{t-24} - y^*_{t-25} - y^*_{t-36} +y^*_{t-37})\\ = & ~ \beta_1(x^*_{1,t} - x^*_{1,t-1} - x^*_{1,t-12} + x^*_{1,t-13}) -\Phi_{1}\beta_1(x^*_{1,t-12} - x^*_{1,t-13} - x^*_{1,t-24} + x^*_{1,t-25}) -\Phi_{2}\beta_1(x^*_{1,t-24} - x^*_{1,t-25} - x^*_{1,t-36} + x^*_{1,t-37}) \\ & + \beta_2(x^*_{2,t} - x^*_{2,t-1} - x^*_{2,t-12} + x^*_{2,t-13}) - \Phi_{1}\beta_2(x^*_{2,t-12} - x^*_{2,t-13} - x^*_{2,t-24} + x^*_{2,t-25}) - \Phi_{2}\beta_2(x^*_{2,t-24} - x^*_{2,t-25} - x^*_{2,t-36} + x^*_{2,t-37}) \\ & + \varepsilon_t + \theta_1 \varepsilon_{t-1}. \end{align*}\] Finally, we move all but \(y_t^*\) to the right hand side: \[\begin{align*} y^*_{t} = & ~ y^*_{t-1} + y^*_{t-12} - y^*_{t-13} +\Phi_{1}(y^*_{t-12} - y^*_{t-13} - y^*_{t-24} +y^*_{t-25}) +\Phi_{2}(y^*_{t-24} - y^*_{t-25} - y^*_{t-36} +y^*_{t-37})\\ & + \beta_1(x^*_{1,t} - x^*_{1,t-1} - x^*_{1,t-12} + x^*_{1,t-13}) -\Phi_{1}\beta_1(x^*_{1,t-12} - x^*_{1,t-13} - x^*_{1,t-24} + x^*_{1,t-25}) -\Phi_{2}\beta_1(x^*_{1,t-24} - x^*_{1,t-25} - x^*_{1,t-36} + x^*_{1,t-37}) \\ & + \beta_2(x^*_{2,t} - x^*_{2,t-1} - x^*_{2,t-12} + x^*_{2,t-13}) - \Phi_{1}\beta_2(x^*_{2,t-12} - x^*_{2,t-13} - x^*_{2,t-24} + x^*_{2,t-25}) - \Phi_{2}\beta_2(x^*_{2,t-24} - x^*_{2,t-25} - x^*_{2,t-36} + x^*_{2,t-37}) \\ & + \varepsilon_t + \theta_1 \varepsilon_{t-1}. \end{align*}\]
- Describe how this model could be used to forecast electricity demand for the next 12 months.
For \(t=T+1\), we use the above equation to find a point forecast of \(y_{T+1}^*\), setting \(\varepsilon_{T+1}=0\) and \(\hat{\varepsilon}_T\) to the last residual. The actual \(y_t^*\) values are all known, as are all the \(x_{1,t}^*\) and \(x_{2,t}^*\) values up to time \(t=T\). For \(x_{1,T+1}^*\) and \(x_{2,T+1}^*\), we could use a forecast (for example, a seasonal naive forecast).
For \(t=T+2,\dots,T+12\), we do something similar, but both \(\varepsilon\) values are set to 0 and \(y^*_{T+k}\) (\(k\ge1\)) is replaced by the forecasts just calculated.
- Explain why the \(\eta_t\) term should be modelled with an ARIMA model rather than modelling the data using a standard regression package. In your discussion, comment on the properties of the estimates, the validity of the standard regression results, and the importance of the \(\eta_t\) model in producing forecasts.
If \(\eta_t\) is autocorrelated but treated as white noise (as a standard regression package assumes), the coefficient estimates remain unbiased but are no longer efficient, and the standard errors, test statistics and \(P\)-values reported by the regression are invalid. Modelling \(\eta_t\) as an ARIMA process restores valid inference, and it matters for forecasting: the autocorrelation in recent errors carries information about upcoming observations, so ignoring it gives inefficient forecasts with incorrect prediction intervals.
For the retail time series considered in earlier chapters:
- Develop an appropriate dynamic regression model with Fourier terms for the seasonality. Use the AIC to select the number of Fourier terms to include in the model. (You will probably need to use the same Box-Cox transformation you identified previously.)
set.seed(12345678)
myseries <- aus_retail |>
filter(
`Series ID` == sample(aus_retail$`Series ID`, 1),
Month < yearmonth("2018 Jan")
)
myseries |> features(Turnover, guerrero)
## # A tibble: 1 × 3
## State Industry lambda_guerrero
## <chr> <chr> <dbl>
## 1 Northern Territory Clothing, footwear and personal accessory … 0.0776
myseries |> autoplot(log(Turnover))
fit <- myseries |>
model(
`K=1` = ARIMA(log(Turnover) ~ trend() + fourier(K = 1) +
pdq(0:2, 0, 0:2) + PDQ(0:1, 0, 0:1)),
`K=2` = ARIMA(log(Turnover) ~ trend() + fourier(K = 2) +
pdq(0:2, 0, 0:2) + PDQ(0:1, 0, 0:1)),
`K=3` = ARIMA(log(Turnover) ~ trend() + fourier(K = 3) +
pdq(0:2, 0, 0:2) + PDQ(0:1, 0, 0:1)),
`K=4` = ARIMA(log(Turnover) ~ trend() + fourier(K = 4) +
pdq(0:2, 0, 0:2) + PDQ(0:1, 0, 0:1)),
`K=5` = ARIMA(log(Turnover) ~ trend() + fourier(K = 5) +
pdq(0:2, 0, 0:2) + PDQ(0:1, 0, 0:1)),
`K=6` = ARIMA(log(Turnover) ~ trend() + fourier(K = 6) +
pdq(0:2, 0, 0:2) + PDQ(0:1, 0, 0:1))
)
glance(fit)
## # A tibble: 6 × 10
## State Industry .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 Northern … Clothin… K=1 0.00664 383. -748. -748. -713. <cpl> <cpl>
## 2 Northern … Clothin… K=2 0.00652 389. -756. -755. -713. <cpl> <cpl>
## 3 Northern … Clothin… K=3 0.00626 400. -774. -773. -723. <cpl> <cpl>
## 4 Northern … Clothin… K=4 0.00596 411. -792. -791. -734. <cpl> <cpl>
## 5 Northern … Clothin… K=5 0.00480 453. -873. -872. -811. <cpl> <cpl>
## 6 Northern … Clothin… K=6 0.00437 470. -906. -905. -841. <cpl> <cpl>
Including 6 harmonics minimises the AICc (and AIC/BIC) for this series.
fit <- transmute(fit, best = `K=6`)
report(fit)
## Series: Turnover
## Model: LM w/ ARIMA(1,0,1)(1,0,0)[12] errors
## Transformation: log(Turnover)
##
## Coefficients:
## ar1 ma1 sar1 trend() fourier(K = 6)C1_12
## 0.9632 -0.3755 0.1761 0.0041 -0.0809
## s.e. 0.0165 0.0502 0.0542 0.0006 0.0080
## fourier(K = 6)S1_12 fourier(K = 6)C2_12 fourier(K = 6)S2_12
## -0.1258 0.0381 -0.0882
## s.e. 0.0080 0.0052 0.0052
## fourier(K = 6)C3_12 fourier(K = 6)S3_12 fourier(K = 6)C4_12
## -0.0206 -0.0815 -0.0294
## s.e. 0.0045 0.0045 0.0042
## fourier(K = 6)S4_12 fourier(K = 6)C5_12 fourier(K = 6)S5_12
## -0.0538 -0.0554 -0.0540
## s.e. 0.0042 0.0041 0.0041
## fourier(K = 6)C6_12 intercept
## -0.0230 1.3317
## s.e. 0.0029 0.1231
##
## sigma^2 estimated as 0.004368: log likelihood=470.23
## AIC=-906.46 AICc=-904.65 BIC=-840.54
The chosen model has a linear trend (which becomes exponential after back-transformation) and Fourier terms with 6 harmonics. The error model is ARIMA(1,0,1)(1,0,0)\(_{12}\).
- Check the residuals of the fitted model. Does the residual series look like white noise?
gg_tsresiduals(fit)
The residuals look well behaved.
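A portmanteau test is a useful complement to the plots (a sketch; dof = 3 counts the ar1, ma1 and sar1 coefficients in the fitted model):
augment(fit) |>
  features(.innov, ljung_box, lag = 24, dof = 3)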
- Compare the forecasts with those you obtained earlier using alternative models.
fit <- myseries |>
model(
dynamic = ARIMA(log(Turnover) ~ trend() + fourier(K = 6) +
pdq(0:2, 0, 0:2) + PDQ(0:1, 0, 0:1)),
arima = ARIMA(log(Turnover)),
ets = ETS(Turnover)
)
fit |>
forecast() |>
autoplot(filter(myseries, year(Month) > 2010), level = 80, alpha = 0.5)
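The point forecasts from the three models could also be compared numerically (a sketch; `pivot_wider()` is from tidyr, attached with fpp3):
fit |>
  forecast(h = "2 years") |>
  as_tibble() |>
  select(.model, Month, .mean) |>
  tidyr::pivot_wider(names_from = .model, values_from = .mean)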